Modeling phone correlation for speaker adaptive speech recognition
Authors
Abstract
Information about the relationships between phones is regarded as playing an important role in speech recognition, and it has been successfully exploited in many speaker adaptation approaches. In this paper, we propose a new approach, named Phone Pair Model (PPM) re-scoring, that uses phone relationships for speaker-adaptive speech recognition. PPM re-scoring does not actually adapt model parameters to a new speaker. Instead, it uses a few pre-registered phone samples from the speaker being recognized to re-calculate the phone likelihoods that were computed with conventional phone HMMs, yielding a more accurate recognition result. It can also handle both inter-speaker and intra-speaker acoustic variation. Two recognition experiments, one using phone HMMs only and the other combining phone HMMs with PPMs, showed that PPM re-scoring increased the recognition rate even when only a few vowel samples were used as the pre-registered phones.
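The abstract does not specify the form of the PPM, so the following is only a minimal sketch of the general re-scoring idea: a baseline per-phone log-likelihood (standing in for the phone HMM score) is combined with a second score that depends on a pre-registered sample from the same speaker. The one-dimensional feature, the Gaussian model over the offset between the observed feature and the registered sample, the `weight` parameter, and all names and numbers are assumptions made for illustration, not the paper's method or data.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def rescore(hmm_loglik, observed, registered, pair_models, weight=1.0):
    """Add a speaker-dependent pair score to each baseline phone score.

    hmm_loglik:  dict phone -> baseline log-likelihood (stand-in for HMM output)
    observed:    feature of the segment being recognized (hypothetical 1-D value)
    registered:  feature of a pre-registered sample from the same speaker
    pair_models: dict phone -> (mean, var) of the offset observed - registered;
                 this Gaussian form is an assumption, not taken from the paper
    """
    offset = observed - registered
    rescored = {}
    for phone, base in hmm_loglik.items():
        mean, var = pair_models[phone]
        rescored[phone] = base + weight * gaussian_logpdf(offset, mean, var)
    return rescored


def best_phone(scores):
    """Return the phone with the highest score."""
    return max(scores, key=scores.get)


# Toy numbers: the baseline slightly prefers "i", but the offset from the
# speaker's registered sample fits the "a" pair model far better.
hmm_loglik = {"a": -10.0, "i": -9.5}
pair_models = {"a": (0.2, 0.1), "i": (2.0, 0.1)}
rescored = rescore(hmm_loglik, observed=1.2, registered=1.0, pair_models=pair_models)
```

In this toy setup the acoustic model parameters are never modified; only the candidate scores change, which matches the abstract's point that re-scoring is not model adaptation.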
Similar resources
Convolutional neural network with adaptive windows for speech recognition
Although speech recognition systems are widely used and their accuracy is continuously improving, there is a considerable gap between their performance and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
A comparison of normalization and training approaches for ASR-dependent speaker identification
In this paper we discuss a speaker identification approach, called ASR-dependent speaker identification, that incorporates phonetic knowledge into the models for each speaker. This approach differs from traditional methods for performing text-independent speaker identification, such as global Gaussian mixture modeling, that typically ignore the phonetic content of the speech signal. We introduce...
Phone Adaptive Training for Speaker Diarization
The linguistic content of a speech signal is a source of unwanted variation which can degrade speaker diarization performance. This paper presents our latest work to reduce its impact. The new approach, referred to as Phone Adaptive Training (PAT), is analogous to speaker adaptive training used in automatic speech recognition. We report an oracle experiment which shows that PAT has the potentia...
Journal title:
Volume, issue:
Pages: -
Publication date: 2000